Three Stages

In a previous blog post by Gustav Delius, 3 stages of building mizer models are described:

  1. Collect information about the important species in your ecosystem and how they are fished. This includes physiological parameters for the species as they might be found on fishbase, but also information about how abundant the species are and how they are being fished.

  2. Create a mizer model that in its steady state reproduces the time-averaged observed state of your fish community.

  3. Tune the model parameters further to also reproduce time-series observations that capture some of the system’s sensitivity to perturbations, like changes in fishing pressure.

This blog post will focus on the transition between stage 2 and 3. First, we wil use Gustav’s recipe (or constant reproduction trick) to get some initial steady state estimates, which I then use to

Example we will use

We will re-visit stage 2 first, this time using biomass estimates from stock assessments. Stock assessments are not a perfect choice, being derived from a combination of data and single species models, but they are often the best estimates of biomass available, and for our purpose. In future, I will update this blog to compare with survey-based biomass estimates (also not perfect).

Some of the functions we will be using are still in active development in the mizerExperimental package. Therefore we will always want to make sure we are loading the latest version of the package with

remotes::install_github("sizespectrum/mizerExperimental")
## Skipping install of 'mizerExperimental' from a github remote, the SHA1 (0d46d1e4) has not changed since last install.
##   Use `force = TRUE` to force installation
library(mizerExperimental)
## Loading required package: mizer
## 
## Attaching package: 'mizerExperimental'
## The following objects are masked from 'package:mizer':
## 
##     setBevertonHolt, steady

This blog post was compiled with mizer version 2.2.1 and mizerExperimental version 2.2.1.9002

We again will use the North Sea model, this time already set up with the biomass observations:

species_params <- readRDS("data-for-steady-to-time/ns_calib_param.RDS")

A recap on getting the steady state

There are many different ways to calibrate or tune models using time-averaged data. I usually do this in mizer by running the dynamics through time, outputting the time-averaged biomass or catches and calculating the differences with observations using a least-squares approach (usually optimisation, not always). This usually comes with challenges,which I won’t go into here, but it can get messy.

So I was excited to check out the constant reproduction trick, which I think could bypass a lot of fiddling around and saves the optimisation step for stage 3: fitting to times series data - when more data comes into play and we will get onto a little later.

The first step of stage 2 is very handy. Instead of setting up the model and projecting through time, we will use steady() to get the steady state with constant reproduction (and the adjusted the reproduction parameters, the reproductive efficiencies or erepro). It then sets the resulting steady state as the initial state of the MizerParams object.

This time we also use the species x specie interaction matrix, which was based on the spatial co-occurrences of each species, hence influences the strength of encounter interactions.

gear_params<-data.frame(species = species_params$species,
               gear = species_params$species,
               sel_func = "sigmoid_length",
               l25 =  c(7.6, 9.8, 8.7, 10.1, 11.5, 19.8, 16.4, 19.8, 11.5,
                        19.1, 13.2, 35.3),
               l50 = c(8.1, 11.8, 12.2, 20.8, 17.0, 29.0, 25.8, 29.0, 17.0,
                       24.3, 22.9, 43.6),
               catchability = rep(1,dim(species_params)[1]),
               initial_effort =species_params$catchability)

params <- newMultispeciesParams(species_params = species_params,gear_params=gear_params,inter,max_w=1e6) # inter comes with loading "mizer")
## Note: No h provided for some species, so using f0 and k_vb to calculate it.
## Note: Because you have n != p, the default value is not very good.
## Note: No ks column so calculating from critical feeding level.
## Note: Using z0 = z0pre * w_inf ^ z0exp for missing z0 values.
## Note: Using f0, h, lambda, kappa and the predation kernel to calculate gamma.
params <- steady(params)
## Convergence was achieved in 58.5 years.
plotlySpectra(params, power = 2)

We can then check this new params object does produce a steady state.

sim <- project(params,effort=gear_params$initial_effort, t_max = 5)
plot(sim)

Using the handy function ‘calibrateBiomass’ scales the system (background resource and other linked parameters) to get the total biomass in the model to match up with the total observed biomass of all 12 species. Then ‘matchBiomass’ calculates the scaling needed for each of the species’ biomasses to match up to their observed ones. This part is a bit of an iterative process: match biomass, calculate steady steady, re-check match to biomass and repeatt until satisfactory. It uses %>% from tidyverse to make this easy.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.3     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
params <- params %>% calibrateBiomass() %>%  matchBiomasses() %>%  steady()
## Convergence was achieved in 39 years.
plotBiomassVsSpecies(params)

Pretty good! Now let’s check we are at steady state and have a look to see if the size spectra look OK:

# note: for the growth curves, this step needed a sim object not params as in the blog, so below I have projected forward from steady to create a sim object
sim <- project(params,effort=gear_params$initial_effort,t_max = 10)
plotlySpectra(sim, power = 2, total = TRUE)

Most look reasonable as do the growth curves below, in comparison with those based purely on the empirical von Bertlanffy parameters. We would of course expect some variation here, with the modelling being driven by mechanistic processes involved in food dependent growth and size-at-age data typically highly variable. Ultimately, we need to re-examine these later with data as well.

plotGrowthCurves(sim, species_panel = TRUE)

We also wnat to check the diets look OK. For this we can use a handy plot in ‘mizerHowTo’, which is a package that has more in depth tutorials for you to play with some day, particularly for examining these diagnostics further and for hand-tuning, using shiny apps.

library(mizerHowTo)
## Loading required package: parallel
plotDiet2(sim)
## Warning: Removed 4876 rows containing missing values (position_stack).

We now have a steady state that we can use as a starting point for tunit the model through time. This involves considering density dependence in reproduction, in models.

Setting up density dependence in reproduction

In the above steps we have not incorporate the additional sources of density dependence not fully captured by the model. We can confirm this from:

reproduction_level <- getReproductionLevel(params)
reproduction_level
##   Sprat Sandeel  N.pout Herring     Dab Whiting    Sole Gurnard  Plaice Haddock 
##       0       0       0       0       0       0       0       0       0       0 
##     Cod  Saithe 
##       0       0

And we can double check to confirm that Rmax = “Inf”, and only erepro values are entered, for each species:

species_params(params)[, c("erepro", "R_max")]
##               erepro R_max
## Sprat   3.555930e-03   Inf
## Sandeel 4.834932e-05   Inf
## N.pout  4.079791e-04   Inf
## Herring 1.948775e-04   Inf
## Dab     1.501725e-04   Inf
## Whiting 1.922795e-04   Inf
## Sole    7.031367e-05   Inf
## Gurnard 6.099075e-04   Inf
## Plaice  4.251633e-03   Inf
## Haddock 8.024415e-05   Inf
## Cod     6.627646e-08   Inf
## Saithe  5.232517e-07   Inf

Additional unaccounted for density dependence could include habitat, are extra sources of mortality or reproductive success. These parameters along with erepro, the reproductive efficiencies, affect how species collapse (and recover) under changes in fishing through time.

We may wish to not include any additional density dependence, or we may wish to find out, what the best estimates of these values are, when the model is fit to time series data. This is what we will do below, this time using catch data through time, and time-varying fishing mortality rates as inputs (derived from stock assessments). In a later post I will show how we can avoid the circularity of stock assessment inputs and use effort information directly. For now, they suit our purpose.

First, we need to assume some level of density dependence to get started. Arbritrality I will choose 0.9 to get some values, and we can also have a look at how this appears in the stock-recruitment function:

params2 <- setBevertonHolt(params, reproduction_level = 0.9)

plotBevertonHolt2 <- function(params, params2, species) {
  select <- species_params(params)$species == species
  erepro <- species_params(params)$erepro[select]
  w0 <- params@w[params@w_min_idx[select]]
  E_R_ss <- getRDI(params)[select] / erepro * 2 * w0
  R_dd_ss <- getRDD(params)[select]
  E_R <- seq(0, 2 * E_R_ss, length.out = 50)
  
  R_max  <- species_params(params)$R_max[select]
  R_di = erepro * E_R / 2 / w0
  R_dd <- R_di / (1 + R_di / R_max)
  df <- melt(data.frame(E_R, R_dd, R_di, R_max), id.vars = "E_R")
  df$Model <- "Model 1"
  
  erepro <- species_params(params2)$erepro[select]
  R_max  <- species_params(params2)$R_max[select]
  R_di = erepro * E_R / 2 / w0
  R_dd <- R_di / (1 + R_di / R_max)
  df2 <- melt(data.frame(E_R, R_dd, R_di, R_max), id.vars = "E_R")
  df2$Model <- "Model 2"
  
  ggplot(rbind(df, df2)) +
    geom_line(aes(x = E_R, y = value, linetype = variable,
                  colour = Model, size = Model)) +
    geom_point(aes(x = E_R_ss, y = R_dd_ss), size = 3, color = "red") +
    ylim(NA, 1.1 * R_max) +
    ylab("Reproduction rate [eggs/year]") +
    xlab("Energy invested [g/year]") +
    labs(linetype = "", size = "R_max", colour = "R_max") +
    scale_size_manual(values = c(0.5, 1)) +
    scale_colour_manual(values = c("blue", "black")) +
    scale_linetype_manual(values = c("solid", "dashed", "dotted"))
}

plotBevertonHolt2(params, params2, "Cod")
## Warning: Removed 87 row(s) containing missing values (geom_path).

We can re-run the model to check it still reaches a steady state:

# re-run to check it
params2 <- steady(params2)
## Convergence was achieved in 1.5 years.
sim <- project(params2,effort=gear_params$initial_effort, t_max = 100)
plot(sim)

Changes in species’ fishing mortality rates through time

Next, we will read in fishing mortality rate time series. These are fairly recent ICES stock assessment fishing mortality estimates, here used as model inputs through time.

Let’s incorporate this time-varying fishing into the model.

# read in stored fishing mortality (here called effort) time series
effort <- readRDS("data-for-steady-to-time/effortTime.RDS")
simt<- project(params2, effort = effort)
plotYieldGear(simt)

plotlyBiomass(simt)

Here, we are interested in examining the changes along side observations. Let’s read in some observed landings for the North Sea and add these to our plot.

#read in observed yield values (again need to update these data from ICES)
obsy <- as.matrix(read.csv("data-for-steady-to-time/catchesMat.csv")[1:73,])
rownames(obsy)<-obsy[,1] 
obsy <-reshape2::melt(obsy[,-1])
names(obsy)<-c("time","sp","value")

source("calibration_functions.R")
plotFittedTime(simt,obsy)
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 334 rows containing missing values (geom_point).

Currently, the modelled yields are nowhere near the data time series. Ideally we would like the lines to pass through the cloud of points for each species….

One issue could be that the erepro values we just re-calibrated the model make the species much more reactive to fishing. Let’s examine how sensitive the time series (and their visual agreement to data look when we change our assumptions about eRepro, and possibly Rmax).

Given we have much more data now than in the time averaged calibration we can estimate more parameters. Because erepro influences how stocks respond to fishing this is an appropraite one to include, but only when Rmax is also being re-calibrated. In this example we also can re-estimated kappa and r_pp to control the background resource spectrum.

Other parameters could be estimated here, such as the fishing mortality parameters. Also Bayesian parameter uncertainty work has done this with Rmax (Spence et al 2016) and time-varying Fs (Spence et al 2021) using this North Sea model.

Optimistaion is computationally intensive so we will run the below setting up multiple computer cores and using optimParallel. The function getErrorTime is user defined and it runs the model and calculates the sum of squared errors between the observed and modelled catches for all species’ time series. If you want to see how to write the error function it is in the file “calibration_fuctions.R” and can be examined here:

# some starting values - using log-scale for Rmax and kappa - and combine into a single vector
vary<-c(log10(params2@species_params$R_max),params2@species_params$erepro,params2@resource_params$kappa,params2@resource_params$r_pp)

We will pass this function to optimParallel to estimate the lowest sum of squared errors between the observed and modelled catches for all species’ time series. If you wat to see how to write the error function it is in the file “calibration_fuctions.R” called “getErrorTime()”.

Now, to run in parallel, the first set up a cluster of multiple computer cores to run model in parallel. I have commented this out as it takes a looooong time to run (so, depending on your computer you might want to do it while you take a break). For this example, we will just read in our previously saved the results.

# SHOULD I JUST WRAP THIS INTO A FUNCTION - optimMizer? Or keep open so 
# library(parallel)
# library(optimParallel)
# 
# noCores <- detectCores() - 1 # keep a spare core
#  cl <- makeCluster(noCores, setup_timeout = 0.5)
#    setDefaultCluster(cl = cl)
#    clusterExport(cl, as.list(ls()))
#    clusterEvalQ(cl, {
#      library(mizerExperimental)
#      library(optimParallel)
#    })
   
 # run the optimisation  - note the lower and upper ranges were enter manually and will need to be different if you change parameters or model for your own example
# optim_result <-optimParallel(par=vary,getErrorTime,params=params2, dat = obsy, method   ="L-BFGS-B",lower=c(rep(3,12),rep(1e-3,12),3,1),upper= c(rep(15,12),rep(1,12),15,10),parallel=list(loginfo=TRUE, forward=TRUE))
#    
# stopCluster(cl)
# # save results  
# saveRDS(optim_result, file="optim_result_timeseries_GD.RDS")

optim_result<-readRDS("data-for-steady-to-time/optim_result_timeseries.RDS")

NOTE: these are not actually the most recent optimisation runs - using here just to show what the blog would show

Now let’s plug these back in and take a look at the plots….

#put these new vals intospecies_params and go back to the top of this page to re-check the calibration 


species_params(params2)$R_max<-10^optim_result$par[1:12]
species_params(params2)$erepro<-optim_result$par[13:24]
resource_params(params2)$kappa<-10^optim_result$par[25]
resource_params(params2)$r_pp<-optim_result$par[26]

#re-run time-varying effort model tthough time with new erepro
sim_opt <- project(params2, effort = effort)

plotFittedTime(sim_opt,obsy)
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 334 rows containing missing values (geom_point).

plotFittedTime(sim_opt,obsy,allSpecies = F,plotSpecies = "Cod")
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 16 rows containing missing values (geom_point).

#### also can check how these new params perform in terms of the equlibrium yield curves
plotYieldVsF(params2, "Sprat")
## Convergence was achieved in 78 years.
## Convergence was achieved in 13.5 years.
## Convergence was achieved in 4.5 years.
## Convergence was achieved in 4.5 years.
## Convergence was achieved in 4.5 years.
## Convergence was achieved in 3 years.
## Convergence was achieved in 3 years.
## Convergence was achieved in 3 years.
## Convergence was achieved in 3 years.
## Convergence was achieved in 3 years.

getYieldVsF(params2, "Gurnard")
## Convergence was achieved in 78 years.
## Convergence was achieved in 13.5 years.
## Convergence was achieved in 6 years.
## Convergence was achieved in 4.5 years.
## Convergence was achieved in 4.5 years.
## Convergence was achieved in 4.5 years.
## Convergence was achieved in 4.5 years.
## Convergence was achieved in 4.5 years.
## Convergence was achieved in 4.5 years.
## Convergence was achieved in 3 years.
##            F      yield
## 1  0.0000000          0
## 2  0.3333333 3589238772
## 3  0.6666667 5050357621
## 4  1.0000000 5716136763
## 5  1.3333333 5936144170
## 6  1.6666667 5921773988
## 7  2.0000000 5779283145
## 8  2.3333333 5575827473
## 9  2.6666667 5352235526
## 10 3.0000000 5148049556
getYieldVsF(params2, "Cod")
## Convergence was achieved in 78 years.
## Convergence was achieved in 61.5 years.
## Convergence was achieved in 9 years.
## Convergence was achieved in 9 years.
## Convergence was achieved in 7.5 years.
## Convergence was achieved in 6 years.
## Convergence was achieved in 6 years.
## Convergence was achieved in 6 years.
## Convergence was achieved in 6 years.
## Convergence was achieved in 4.5 years.
##            F        yield
## 1  0.0000000            0
## 2  0.3333333 127160883950
## 3  0.6666667  76713651167
## 4  1.0000000  48129261971
## 5  1.3333333  34191476056
## 6  1.6666667  26675850579
## 7  2.0000000  22001665564
## 8  2.3333333  18779880071
## 9  2.6666667  16339597278
## 10 3.0000000  14350213929

Interpretation of plots. Next steps.

Summary

Gustav: I will complete the above and add this section if, after you have a look, you are happy to post this blog, I can edit further , change & shorten some sections of course if you have suggestions.